Dual Distributional Verb Sense Disambiguation With Small Corpora And Machine Readable Dictionaries

Authors

Jeong-Mi Cho

Jungyun Seo

Gilchang Kim

Abstract

This paper presents a system for unsupervised verb sense disambiguation in Korean using a small corpus and a machine-readable dictionary (MRD). For each sense of a polysemous verb in the MRD definitions, the system learns the set of typical usages listed in the MRD usage examples, using verb-object co-occurrences acquired from the corpus. The paper addresses the problem of data sparseness in two ways. First, extending word similarity measures from direct co-occurrences to co-occurrences of co-occurred words, we compute word similarities using co-occurred clusters rather than co-occurred words. Second, we acquire IS-A relations of nouns from the MRD definitions; identifying the IS-A relationship makes it possible to cluster the nouns roughly. With these methods, two words may be considered similar even if they do not share any co-occurring words. Experiments show that this method can learn from a very small training corpus, achieving over 86% correct disambiguation without restricting the senses of a word.

1 Introduction

Much recent research in the field of natural language processing has focused on an empirical, corpus-based approach, and the high accuracy achieved by a corpus-based approach to part-of-speech tagging and parsing has inspired similar approaches to word sense disambiguation. For the most successful approaches to such problems, correctly annotated materials are crucial for training learning-based algorithms. Regardless of whether or not learning is involved, the prevailing evaluation methodology requires correct test sets in order to rigorously assess the quality of algorithms and compare their performance. This seems to require manual tagging of the training corpus with the appropriate sense for each occurrence of an ambiguous word.
However, in marked contrast to annotated training material for part-of-speech tagging, (a) there is no coarse-level set of sense distinctions widely agreed upon (whereas part-of-speech tag sets tend to differ in the detail); (b) sense annotation has a comparatively high error rate (Miller, personal communication, reports an upper bound for human annotators of around 90% for ambiguous cases, using a non-blind evaluation method that may make even this estimate overly optimistic (Resnik, 1997)); (c) consequently, a sense-tagged corpus large enough to achieve broad-coverage, high-accuracy word sense disambiguation is not available at present.

This paper describes an unsupervised sense disambiguation system using a POS-tagged corpus and a machine-readable dictionary (MRD). The system we propose circumvents the need for a sense-tagged corpus by using the MRD's usage examples as sense-tagged examples. Because these usage examples show natural uses of each sense of the headword, we can acquire useful sense disambiguation context from them. For example, open has several senses, and a dictionary lists usage examples for each sense, as shown in Table 1. The words window, door, box, conference, and meeting within the usage examples are useful context for sense disambiguation of open.

headword: open 2

sense    usage examples
open     Open the window a bit, please.
         He opened the door for me to come in.
         Open the box.
start    Our chairman opened the conference by welcoming new delegates.
         Open a public meeting.

Table 1: The entry of open (vt.) in OALD

* This work was supported in part by KISTEP for Soft Science Research project.

Another problem that is common to much corpus-based work is data sparseness, and the problem is especially severe for work in WSD. First, enormous amounts of text are required to ensure that all senses of a polysemous word are represented, given the vast disparity in frequency among senses.
In addition, the many possible co-occurrences for a given polysemous word are unlikely to be found in even a very large corpus, or they occur too infrequently to be significant. In this paper, we propose two methods
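
The idea of comparing words through the co-occurrences of their co-occurring words, and of using MRD usage examples as sense-tagged data, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy verb-object pairs are invented, Jaccard overlap stands in for whatever similarity measure the paper actually defines, the function names are hypothetical, and the clustering and IS-A steps described in the abstract are not reproduced. Only the sense labels and the nouns in SENSE_EXAMPLES come from Table 1.

```python
from collections import defaultdict

# Invented verb-object pairs standing in for a small POS-tagged corpus.
PAIRS = [
    ("close", "window"), ("close", "gate"),
    ("lock", "gate"), ("lock", "drawer"),
    ("attend", "conference"), ("attend", "seminar"),
    ("hold", "seminar"), ("hold", "ceremony"),
]

# Nouns taken from the usage examples of "open" in Table 1, keyed by sense.
SENSE_EXAMPLES = {
    "open": {"window", "door", "box"},
    "start": {"conference", "meeting"},
}


def build_indexes(pairs):
    """Map each noun to the verbs it is an object of, and each verb to its objects."""
    verbs_of = defaultdict(set)
    nouns_of = defaultdict(set)
    for verb, noun in pairs:
        verbs_of[noun].add(verb)
        nouns_of[verb].add(noun)
    return verbs_of, nouns_of


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def first_order(noun, verbs_of):
    """Direct co-occurrences: the verbs seen with this noun."""
    return set(verbs_of.get(noun, set()))


def second_order(noun, verbs_of, nouns_of):
    """Co-occurrences of co-occurred words: objects of the verbs seen with this noun."""
    expanded = set()
    for verb in verbs_of.get(noun, set()):
        expanded |= nouns_of[verb]
    return expanded


def disambiguate(obj, sense_examples, verbs_of, nouns_of):
    """Pick the sense whose usage-example nouns best match the object noun."""
    target = second_order(obj, verbs_of, nouns_of)
    scores = {
        sense: max(jaccard(target, second_order(n, verbs_of, nouns_of)) for n in nouns)
        for sense, nouns in sense_examples.items()
    }
    # Ties (including all-zero scores) fall back to the first listed sense.
    return max(scores, key=scores.get), scores


VERBS_OF, NOUNS_OF = build_indexes(PAIRS)
```

In this toy data, "drawer" and "window" never occur with the same verb, so their direct co-occurrence overlap is zero, yet both are linked to "gate" through "lock" and "close". Calling `disambiguate("drawer", SENSE_EXAMPLES, VERBS_OF, NOUNS_OF)` therefore prefers the "open" sense, which is the kind of similarity without shared words that the abstract describes.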

Similar resources

6 Unsupervised corpus-based methods for WSD

This chapter focuses on unsupervised corpus-based methods of word sense discrimination that are knowledge-lean, and do not rely on external knowledge sources such as machine readable dictionaries, concept hierarchies, or sense-tagged text. They do not assign sense tags to words; rather, they discriminate among word meanings based on information found in unannotated corpora. This chapter reviews...

6 Unsupervised Corpus-Based Methods for WSD 6.1

Kannada Word Sense Disambiguation for Machine Translation

Polysemous words can have more than one distinct meaning. Word sense disambiguation (WSD) is the ability to identify the exact meaning of such polysemous words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problem in Artificial Intelligence. In this paper, we propose an Integrated Kanna...

Disambiguation of Compound Noun Translations Extracted from Bilingual Comparable Corpora

Bilingual machine readable dictionaries are important and indispensable information resources for cross-language information retrieval, machine translation, and so on. In this paper, we describe a bilingual dictionary acquisition system which extracts translations from non-parallel but comparable corpora of a specific academic domain and disambiguates the extracted translations. We also experim...

Word Sense Disambiguation with Very Large Neural Networks Extracted from Machine Readable Dictionaries

In this paper, we describe a means for automatically building very large neural networks (VLNNs) from definition texts in machine-readable dictionaries, and demonstrate the use of these networks for word sense disambiguation. Our method brings together two earlier, independent approaches to word sense disambiguation: the use of machine-readable dictionaries and connectionist models. The automa...


Publication date: 1999
